ABSTRACT


Due  to  the  large  number  of  product,  project  and 
people  parameters  which  impact  large  custom  software 
development  efforts,  measurement  of  software  product 
quality  is  a  complex  undertaking.  Furthermore,  the  ab¬ 
solute  perspective  from  which  quality  is  measured  (cus¬ 
tomer  satisfaction)  is  intangible.  While  we  probably 
can’t  say  what  the  absolute  quality  of  a  software  product 
is,  we  can  determine  the  relative  quality,  the  adequacy 
of  this  quality  with  respect  to  pragmatic  considerations, 
and  identify  good  and  bad  trends  during  development. 
While no two software engineers will ever agree on an
optimum definition of software quality, they will agree
that the most important perspective of software quality
is its ease of change. We can call this flexibility,
adaptability or some other vague term, but the critical
characteristic of software is that it is soft. The easier
the product is to modify, the easier it is to achieve any
other software quality perspective.

This  paper  presents  objective  quality  metrics  derived 
from  consistent  lifecycle  perspectives  of  rework  which, 
when  used  in  concert  with  an  evolutionary  develop¬ 
ment  approach,  can  provide  useful  insight  to  produce 
better quality per unit cost/schedule or to achieve adequate
quality more efficiently. Software rework experience
with over 1500 software change orders on the Command
Center Processing and Display System - Replacement
(CCPDS-R) project was used both to formulate
the  metrics  definitions  and  to  demonstrate  their  useful¬ 
ness.  These  metrics  can  be  applied  uniformly  from  mul¬ 
tiple  perspectives  (project,  subsystem,  build,  CSCI)  to 
achieve  objective  comparability.  They  are  automated, 
consistent,  and  easy  to  use.  Along  with  subjective  inter¬ 
pretation  to  account  for  the  lifecycle  context,  objective 
insight  into  product  quality  can  be  achieved  early  where 
correction  or  improvement  can  be  instigated  more  effi¬ 
ciently. 

Index  Terms-  Evolutionary  Development,  Software 
Quality  Metrics,  Ada,  Maintainability,  Process  Im¬ 
provement. 


BACKGROUND

There  have  been  many  attempts  to  define  measures 
of  software  quality  in  the  past  20  years.  For  many  rea¬ 
sons,  none  of  these  has  caught  on  as  accepted  practice  in 
the  software  industry.  Reference  [2]  discusses  many  of 
the  problems  and  tradeoffs  associated  with  defining  and 
measuring  software  quality.  One  of  the  recurring  themes 
in  this  work  was  the  need  for  subjectivity  and  expensive 
human resources in both the collection and interpretation
of quality metrics. Furthermore, the concept of a
technology  independent  set  of  metrics,  although  an  ac¬ 
knowledged  desire,  was  not  well  understood.  Reference 
[8]  provides  an  excellent  discussion  of  the  need  for  objec¬ 
tive,  measurable  software  quality  metrics  which  remain 
technology  independent.  Reference  [9]  defines  a  com¬ 
plete  company  metrics  program  with  actual  data  that 
provides  some  valuable  experience  and  lessons  learned. 
Reference [10] describes the most current motivation for
measuring software quality: software development process
improvement.

After three+ years of successful software development
on the Command Center Processing and Display
System  -  Replacement  (CCPDS-R)  project  using  mod¬ 
ern  Ada  software  engineering  techniques  ([12],  [13]  and 
[15]),  TRW  has  derived  a  subset  of  software  quality 
metrics  which  are  measurable,  objective,  and  useful  in 
providing  a  basis  for  improving  downstream  quality  of 
products  and  processes.  One  of  the  problems  with  typ¬ 
ical  government  contracted  systems  like  CCPDS-R  is 
that  most  are  one  of  a  kind  projects.  This  characteris¬ 
tic  provides  added  complexity  to  measurement  since  the 
experience  may  be  only  partially  useful  between  differ¬ 
ent  project  domains. 

The  metrics  presented  herein  have  been  formulated 
to  be  as  useful  as  possible  while  remaining  relatively 
domain  independent  so  that  comparisons  between  dif¬ 
ferent  projects  are  possible.  This  is  not  as  simple  as 
it  sounds  and  the  literature  on  software  quality  metrics 
reinforces  this  experience.  After  many  iterations,  the 
data  presented  herein  has  demonstrated  objective  and 
valuable  insight  in  its  application  to  CCPDS-R  and  it 
provided  a  credible  basis  for  future  subsystem  planning 
as  well  as  a  starting  point  from  which  better  metrics  can 
be  derived. 
Software Quality Metrics Objectives. Software
quality metrics should be simple, easy to use, and hard
to misuse. They should be useful to project management,
stimulate continuous improvement of our development
process, and be low cost to administer consistently
across different projects.

Usefulness. Conventional testing techniques exist
for assessing the functionality, reliability and performance
of a software product; however, there are no
accepted methods for assessing its flexibility (modularity,
changeability, or maintainability). While there are
many  other  perspectives  of  quality  (e.g.,  portability,  in¬ 
teroperability,  etc.),  our  experience  in  executing  an  evo¬ 
lutionary  development  process  has  demonstrated  that 
its  flexibility  aspects  are  the  most  important.  The  eas¬ 
ier  the  product  is  to  modify,  the  easier  it  is  to  achieve 
any  other  software  quality  perspective  except  perhaps 
performance.  The  tradeoff  between  flexibility  and  per¬ 
formance  is  highly  dependent  on  the  application  domain 
as  well  as  many  other  architectural  issues  and  for  the 
purposes of this discussion we will assume that performance
is achieved through proper hardware selection
and that the project is prioritized “software first”. A
project which is prioritized more towards performance
(e.g., a 1750A flight program) may not interpret these
metrics in the same fashion as a project prioritized
towards continuous lifecycle modification (e.g., a ground
based C3 system). This paper will attempt to provide
useful,  objective  definitions  for  modularity,  changeabil¬ 
ity  and  maintainability.  The  intent  of  this  metrics  pro¬ 
gram  is  to  provide  a  mechanism  for  quantifying  both 
end-product  quality  as  well  as  in-progress  development 
trends  toward  achieving  that  quality. 

Development  Language.  Ada  has  proven  to  support 
increased  quality  and  the  evolutionary  process  model  in 
large  software  development  efforts.  Furthermore,  Ada 
appears  to  be  the  language  of  choice  for  the  majority 
of  current  and  future  large  government  projects.  While 
this  paper  assumes  that  Ada  is  the  language  for  design 
and  implementation  of  software  development  projects 
which  use  these  software  quality  metrics,  it  should  be 
straightforward  to  adapt  this  approach  to  other  lan¬ 
guages  through  a  suitable  redefinition  of  a  Source  Line 
of  Code  (SLOC). 

Development  Approach.  An  evolutionary  develop¬ 
ment  approach  as  prescribed  in  the  Ada  Process  Model 
[12]  is  necessary  to  maximize  the  usefulness  of  these  met¬ 
rics  across  a  broader  range  of  the  life  cycle.  The  met¬ 
rics  are  derived  from  controlled  configuration  baselines. 
Therefore,  an  approach  with  early  incremental  baselines 
will  see  an  increased  benefit.  As  a  prerequisite  to  under¬ 
standing  the  derivation  of  the  software  quality  metrics, 
the  following  section  provides  an  overview  of  the  Ada 
Process  Model  employed  on  CCPDS-R. 


Ada  PROCESS  MODEL 

An  Evolutionary  Process  Model  is  fundamental  to 
this  approach  for  Software  Quality  Assessment.  With¬ 
out  tangible  intermediate  products,  software  quality  as¬ 
sessment  would  be  ineffective  and  inaccurate.  Conven¬ 
tional  experience  has  repeatedly  seen  projects  sequence 
through  highly  successful  preliminary  and  critical  design 
phases  (as  perceived  by  conventional  Design  Review  as¬ 
sessment  of  design  quality)  only  to  have  the  true  quality 
problems  surface  in  the  integration  and  test  phases  with 
little  or  no  time  for  proper  resolution.  An  Evolution¬ 
ary  Process  Model  provides  a  systematic  approach  for 
achieving  early  insight  into  product  quality  and  a  uni¬ 
form  lifecycle  measure  for  its  assessment.  It  also  avoids 
the inevitable degradations in quality due to late breakage
and rapid fixes which are shoehorned into the product
without adequate software engineering.

TRW’s  Ada  Process  Model  is,  in  simplest  terms,  a 
uniform  application  of  incremental  Ada  product  evolu¬ 
tion  coupled  with  a  demonstration-based  approach  to 
design  review  for  continuous  and  insightful  thread  test¬ 
ing  and  risk  management.  The  techniques  employed 
within  this  process  are  derived  from  the  philosophy  of 
the  Spiral  Model  [7]  with  emphasis  on  an  evolution¬ 
ary  design  approach.  The  use  of  Ada  as  the  life  cycle 
language  for  design  evolution  provides  the  vehicle  for 
uniformity  and  provides  a  basis  for  consistent  software 
progress  and  quality  metrics. 

TRW’s  Ada  Process  Model  recognizes  that  all  large, 
complex  software  systems  will  suffer  from  design  break¬ 
age  due  to  early  unknowns.  It  strives  to  accelerate  the 
resolution  of  unknowns  and  correction  of  design  flaws  in 
a  systematic  fashion  which  permits  prioritized  manage¬ 
ment  of  risks.  The  dominant  mechanism  for  achieving 
this  goal  is  a  disciplined  approach  to  incremental  devel¬ 
opment.  The  key  strategies  inherent  in  this  approach 
are  directly  aimed  at  the  three  main  contributors  to 
software  diseconomy  of  scale:  minimizing  the  overhead 
and  inaccuracy  of  interpersonal  communications,  elimi¬ 
nating  rework,  and  converging  requirements  stability  as 
quickly  as  possible  in  the  lifecycle.  These  objectives  are 
achieved  by: 

1.  requiring  continuous  and  early  convergence  of  in¬ 
dividual  solutions  in  a  homogeneous  life  cycle  lan¬ 
guage  (Ada). 

2.  eliminating  ambiguities  and  unknowns  in  the  prob¬ 
lem  statement  and  the  evolving  solution  as  rapidly 
as  practical  through  prioritized  development  of  tan¬ 
gible  increments  of  capability. 

Although many of the disciplines and techniques presented
herein can be applied to non-Ada projects, the
expressiveness of Ada as a design and implementation
language and support for partial implementation (abstraction)
provide a strong platform for creating a uniform
approach.

Figure 1: New Techniques vs. Conventional Techniques

Many  of  the  Ada  Process  Model  strategies  (summa¬ 
rized  in  Figure  1)  have  been  attempted,  in  part,  on  other 
software  development  efforts;  however,  there  are  funda¬ 
mental  differences  in  this  approach  compared  to  conven¬ 
tional  software  development  models. 

Uniform  Ada  Lifecycle  Representation.  The  pri¬ 
mary  innovation  in  the  Ada  Process  Model  is  the  use 
of  a  single  language  for  the  entire  software  lifecycle,  in¬ 
cluding,  to  some  degree,  the  requirements  phase.  All  of 
the  remaining  techniques  rely  on  the  ability  to  equate 
design with code so that the only variable during development
is the level of abstraction. This provides two
essential benefits:

1. The ability to quantify units of software
(design/development/test) work in one dimension,
Source Lines of Code (SLOC). While it is certainly
true  that  SLOC  is  not  a  perfect  absolute  measure 
of  software,  with  consistent  counting  rules,  it  has 
proven  to  be  the  best  normalized  measure  and  does 
provide  an  objective,  consistent  basis  for  assessing 
relative  trends  across  the  project  life  cycle. 

2.  A  formal  syntax  and  semantics  for  lifecycle  rep¬ 
resentation  with  automated  verification  by  an  Ada 
compiler.  Ada  compilation  does  not  provide  com¬ 
plete  verification  of  a  component.  It  does  go  a 
long  way,  however,  in  verifying  configuration  con¬ 
sistency,  and  ensuring  a  standard,  unambiguous 
representation. 

Incremental  Development.  Although  risk  manage¬ 
ment  through  incremental  development  is  emphasized 
as  a  key  strategy  of  the  Ada  Process  Model,  it  was  (or 
always  should  have  been)  a  key  part  of  most  conven¬ 
tional  models.  Without  a  uniform  lifecycle  language  as 
a  vehicle  for  incremental  design/code/test,  conventional 
implementations  of  incremental  development  were  diffi¬ 
cult  to  manage.  This  management  is  simplified  by  the 
integrated  techniques  of  the  Ada  Process  Model. 

Design Integration. In this discussion, we will take
a simple minded view of “design” as the structural implementation
or partitioning of software components (in
terms of function and performance) and definition of
their interfaces. At the highest level of design we could
be talking about conventional requirements definition;
at the lowest level, we are talking about conventional
detailed design and coding. Implementation is then the
development of these components to meet their interfaces
while providing the necessary functional performance.
Regardless of level, the activity being performed
is Ada coding. Top level design means coding the top
level components (Ada main programs, task executives,
global types, global objects, top-level library units,
etc.). Lower level design means coding the lower level
program unit specifications and bodies.

The postponement of all coding until after CDR
in conventional software development approaches also
postponed the primary indicator of design quality:
integrability of the interfaces. The Ada Process Model requires
the early development of a Software Architecture
Skeleton (SAS) as a vehicle for early interface definition.
The  SAS  essentially  corresponds  to  coding  the  top  level 
components  and  their  interfaces,  compiling  them,  and 
providing  adequate  drivers/stubs  so  that  they  can  be  ex¬ 
ecuted.  This  early  development  forces  early  baselining 
of  the  software  interfaces  to  best  effect  smooth  evolu¬ 
tion,  early  evaluation  of  design  quality  and  avoidance 
of  downstream  breakage.  In  this  process,  we  have  made 
integration  a  design  activity  rather  than  a  test  activity. 
To  a  large  degree,  the  Ada  language  forces  integration 
through  its  library  rules  and  consistency  of  compiled 
components.  It  also  supports  the  concept  of  separating 
structural  definition  (specifications)  from  runtime  func¬ 
tion  (bodies).  The  Ada  Process  Model  expands  this 
concept  further  by  requiring  structural  design  (SAS) 
prior  to  runtime  function  (executable  threads).  Demon¬ 
strations  provide  a  forcing  function  for  broader  runtime 
integration  to  augment  the  compile  time  integration  en¬ 
forced  by  the  Ada  language. 

Demonstration  Based  Design  Review.  Many  con¬ 
ventional  projects  built  demonstrations  or  benchmarks 
of  standalone  design  issues  (e.g.,  user  system  interface, 
critical algorithms, etc.) to support design feasibility.
However, the design baseline was represented on paper
(PDL,  simulations,  flowcharts,  vugraphs).  These  rep¬ 
resentations  were  vague,  ambiguous  and  not  amenable 
to  configuration  control.  The  degree  of  freedom  in  the 
design  representations  made  it  very  difficult  to  uncover 
design  flaws  of  substance,  especially  for  complex  sys¬ 
tems  with  concurrent  processing.  Given  the  typical 
design  review  attitude  that  a  design  is  “innocent  un¬ 
til  proven  guilty”,  it  was  quite  easy  to  assert  that  the 
design  was  adequate.  This  was  primarily  due  to  the 
lack  of  a  tangible  design  representation  from  which  true 
design  flaws  were  unambiguously  obvious.  Under  the 
Ada  Process  Model,  design  review  demonstrations  pro¬ 
vide  some  proof  of  innocence  and  are  far  more  efficient 
at  identifying  and  resolving  design  flaws.  The  subject 
of  the  design  review  is  not  only  a  briefing  which  de¬ 
scribes  the  design  in  human  understandable  terms,  but 
also  a  demonstration  of  important  aspects  of  the  design 
baseline  which  verify  design  quality  (or  lack  of  quality). 

Total  Quality  Management  (TQM).  In  the  Ada 
Process  Model  there  are  two  key  advantages  for  applying 
TQM.  The  first  is  the  common  Ada  format  throughout 
the  lifecycle  which  permits  consistent  software  metrics 
across  the  software  development  work  force.  Although 
these  metrics  don’t  all  pertain  to  quality  (many  pertain 
to  progress),  they  do  permit  a  uniform  communications 
vehicle  for  achieving  the  desired  quality  in  an  efficient 
manner.  Secondly,  the  demonstrations  serve  to  provide 
a  common  goal  for  the  software  developers.  This  “in¬ 
tegrated  product”  is  a  reflection  of  the  complete  design 
at  various  phases  in  the  life  cycle  for  which  all  person¬ 
nel  have  ownership.  Rather  than  individually  evaluating 
components  which  are  owned  by  individuals,  the  demon¬ 
strations  provide  a  mechanism  for  reviewing  the  team’s 
product.  This  team  ownership  of  the  demonstrations  is 
an  important  motivation  for  instilling  a  TQM  attitude. 

SOFTWARE QUALITY METRICS APPROACH

In  essence,  the  approach  we  are  taking  is  similar  to 
that  of  [8]  who  proposes  to  measure  software  quality 
through  the  absence  of  spoilage.  While  his  definitions 
are  purposely  vague  (to  remain  technology  and  project 
independent),  ours  are  quite  explicit.  The  key  to  this 
metrics approach is similar to conventional cost estimation
techniques such as COCOMO [3] where quantifiability
and consistency of application are important.
Note  that  software  cost  estimation  has  subjective  inputs 
and  objective  outputs.  Our  approach  will  define  objec¬ 
tive  inputs  which  may  require  subjective  interpretation 
for  project  context. 

Our primary metric for software quality will be rework
as measured by changed SLOC in configured baselines.
This metric will also need to be adjusted for
project context to accommodate the product characteristics,
the life cycle phase, etc. The software quality assessment
derived from this objective collection of rework
metrics will require subjective analysis in some cases.
The  subjectivity  here  is  in  the  fact  that  we  are  trying 
to  assess  quality  during  development  (this  requires  sub¬ 
jective  analysis)  using  the  same  metrics  used  to  assess 
quality  following  development  (objective  analysis).  For 
example,  the  volume  of  rework  following  product  deliv¬ 
ery  is  an  objective  measure  of  quality,  or  lack  of  quality. 
The  amount  of  rework  following  the  first  configuration 
baseline  during  development  is  a  subjective  measure. 
Zero  rework  might  be  interpreted  as  a  perfect  baseline 
(unlikely), an inadequate test program, or an unambitious
first build. The following paragraphs define some
of the foundations in this approach:

Software  Quality  Definition.  Software  quality  is  the 
degree  of  compliance  with  the  customer  expectations  of 
function,  performance,  cost  and  schedule.  This  is  an 
incredibly  difficult  concept  to  make  objective.  The 
only  mechanisms  available  for  defining  “customer  expec¬ 
tations”  are  Software  Requirements  Specifications  for 
function  and  performance,  and  an  approved  expendi¬ 
ture  plan  which  quantifies  cost  and  schedule  goals  (ba¬ 
sically,  this  corresponds  to  the  “contract”).  These  two 
mechanisms  are  traditionally  the  lowest  quality  prod¬ 
ucts  produced  by  a  project  since  they  are  required  to  be 
agreed  upon  with  numerous  unknowns  far  too  early  in 
the  lifecycle.  The  evolutionary  process  model  and  soft¬ 
ware  quality  metrics  should  provide  better  insight  into 
the  degree  of  compliance  with  customer  expectations  in 
the  above  four  perspectives. 

Software  Change  Order  (SCO).  A  Software  Change 
Order  constitutes  direction  to  proceed  with  changing  a 
configured  software  component.  This  change  may  be 
needed  to  1)  rework  a  component  with  bad  quality  (a 
fix),  or  2)  rework  a  component  to  achieve  better  quality 
(an  enhancement)  or  3)  accommodate  a  customer  di¬ 
rected  change  in  requirements.  The  difference  between 
the  first  two  types  of  rework  is  inherent  in  the  neces¬ 
sity  for  the  change.  If  the  change  is  required  for  com¬ 
pliance  with  product  specifications,  then  the  rework  is 
type  1.  If  the  change  is  desired  for  cost-effectiveness, 
increased  testability,  increased  usability,  or  other  effi¬ 
ciency  reasons  (assuming  the  unchanged  component  is 
compliant), then the rework is type 2. In both cases,
the rework should result in increased end product quality
(requirements compliance per dollar); however, type
1 also indicates inadequate quality in a current baseline.
In practice, differentiating between type 1 and
type  2  may  be  quite  subjective.  As  discussed  later, 
most  of  the  metrics  are  insensitive  to  the  categoriza¬ 
tion,  but  if  the  differentiation  is  consistently  applied,  it 
can provide useful insight. Conventionally, SCOs were
called Software Problem Reports (SPRs). To avoid confusion
(“problem” has a negative connotation, and not
all changes are necessarily problems), we have changed
the terminology. The software quality metrics collection
and analysis will use type 1 and type 2 SCOs in
an appropriate manner. Type 3 SCOs need to be separated
since they do not reflect any change in quality;
they do, however, redefine the customer expectations.
Furthermore, type 3 SCOs typically reflect a change
which is of more global impact, thereby requiring various
levels of software and system engineering as well as
high level regression testing. These types of SCOs will
not be used in these metrics due to this wide range of
variability. Rather, the data derived from type 1 and
type 2 SCOs should provide a solid basis for estimating
maintainability and the effort required for type 3 SCOs.
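
The three SCO types and the rule that only type 1 and type 2 feed the metrics can be sketched as a small record and filter. This is an illustrative Python sketch only: the record layout, field names, and sample values are hypothetical, since the paper does not prescribe how SCO data is stored.

```python
from dataclasses import dataclass
from enum import Enum


class ScoType(Enum):
    FIX = 1              # type 1: rework required for compliance with specifications
    ENHANCEMENT = 2      # type 2: rework desired for cost-effectiveness, testability, etc.
    NEW_REQUIREMENT = 3  # type 3: customer-directed change in requirements


@dataclass
class Sco:
    # Hypothetical fields chosen for illustration; not taken from the paper.
    sco_id: str
    sco_type: ScoType
    sloc_damaged: int
    effort_hours: float


def metric_scos(scos):
    """Keep only type 1 and type 2 SCOs, which are the ones used for the metrics."""
    return [s for s in scos if s.sco_type in (ScoType.FIX, ScoType.ENHANCEMENT)]


# Invented sample data, not CCPDS-R data.
scos = [
    Sco("SCO-001", ScoType.FIX, 40, 12.0),
    Sco("SCO-002", ScoType.NEW_REQUIREMENT, 900, 300.0),
    Sco("SCO-003", ScoType.ENHANCEMENT, 15, 6.0),
]
print([s.sco_id for s in metric_scos(scos)])
```

Keeping the type on every record, rather than discarding type 3 SCOs at entry, leaves them available for separate analysis while excluding them from the quality metrics.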

Source  Lines  of  Code  (SLOC).  There  has  always 
been  a  controversy  as  to  whether  SLOC  provides  a 
good  metric  for  measuring  software  volume  (DeMarco 
calls this bang). Reference [11] identifies some of the precautions
necessary when dealing with SLOC. Upon reading
open literature which discusses project productivities
(SLOC/MM), it is easy to see that there is little, if any,
comparability between projects within the same company,
much less projects from different companies. Reference [4]
identifies the pros and cons of various measures and comes
to the conclusion that there is nothing better. Everyone
agrees, however, that whatever one uses, it must be
defined objectively and consistently to be of value for
comparison.  How  we  define  the  absolute  unit  of  SLOC 
is  not  as  important  as  defining  it  consistently  across  all 
projects  and  all  areas  of  a  specific  project.  Therefore, 
the  preferred  way  to  define  a  SLOC  is  the  following: 

The  number  of  SLOC  for  a  given  set  of  Ada 
program  units  is  defined  as  the  output  of  a 
SLOC  Counting  Tool. 

Enforcing this definition is simple to achieve by providing
a portable tool. By accepting certain non-controversial
and simple standards for program unit
headers and program layout, the tool can provide more
valuable outputs than simply SLOC counts (e.g., static
hierarchies, and complexity ratings).

Ada/COCOMO [5], [6] defines SLOC for Ada programs
as: Within an Ada specification part, each carriage
return counts as one SLOC. Specifications shall
be coded with the following standards (rationale is provided
in italics):

1. Each parameter of a subprogram declaration should be
listed on a separate line. (The design of a subprogram
interface is done in one place and generally
the effort associated with the interface design is dependent
on the number of parameters.)

2. For custom enumeration types (e.g., system state,
socket names, etc.) and record types, each enumeration
or field should be listed on a separate line.
(Custom types usually involve custom design and
engineering, hence an increased number of SLOC.)

3. For predefined enumeration types (e.g., keyboard
keys, compass directions), enumerations should be
listed on as few lines as possible without loss of
readability. (These kinds of types generally require
no custom engineering.)

4. Initialization of composite objects (e.g., records or
arrays) should be listed with one component per
line. (Frequently, each of these assignments represents
a custom statement; an others clause is
typically used for the non-custom assignments.)

Within  Ada  bodies  each  semi-colon  counts  as  one  SLOC. 
Generic  instantiations  count  one  line  for  each  generic 
parameter  (spec  or  body). 
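
The two basic counting rules above (one SLOC per line in a specification, one SLOC per semicolon in a body) can be sketched as follows. This Python sketch is not the SLOC Counting Tool used on the project. It assumes that blank lines and comment-only lines are skipped in specifications, it does not exclude semicolons or "--" sequences inside string or character literals, and it omits the generic instantiation rule. The function names and the sample Ada text are hypothetical.

```python
def count_spec_sloc(text):
    """Specification part: one SLOC per line.

    Assumption: blank lines and comment-only lines are not counted.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("--"):
            count += 1
    return count


def count_body_sloc(text):
    """Body: one SLOC per semicolon, ignoring trailing Ada comments.

    Simplification: literals containing ';' or '--' are not handled.
    """
    count = 0
    for line in text.splitlines():
        code = line.split("--", 1)[0]  # drop everything after a comment marker
        count += code.count(";")
    return count


spec = """package Stack is
   -- One parameter per line, as the coding standard requires
   procedure Push
     (Item  : in Integer;
      Depth : out Natural);
end Stack;
"""

body = """package body Stack is
   Top : Natural := 0;  -- current depth; comment text is ignored
   procedure Push (Item : in Integer; Depth : out Natural) is
   begin
      Top := Top + 1;
      Depth := Top;
   end Push;
end Stack;
"""

print(count_spec_sloc(spec), count_body_sloc(body))
```

Because the specification count depends on line layout, the layout standards listed above are what make the count repeatable from one developer to the next.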

The definition above treats declarative (specification)
design much more sensitively than it does executable
(body) design. It also does not recognize the
declarative part of a body as having the same importance as
a specification part. Although these and other debates
can  surface  with  respect  to  the  “optimum”  definition 
of  a  SLOC,  the  optimum  absolute  definition  is  far  less 
important  than  a  consistent  relative  definition. 

Quality  Control  Board.  The  QCB  constitutes  the 
governing body responsible for authorizing changes to a
configured  baseline  product  (conventionally  known  as  a 
configuration  control  board  -  CCB).  This  body  is  com¬ 
posed,  at  a  minimum,  of  the  development  manager, 
customer  representative,  each  product  manager,  sys¬ 
tems  effectiveness  representative  and  the  test  manager. 
The  QCB  decides  on  all  proposed  changes  to  configured 
products  and  approves  all  SCOs.  The  QCB  is  respon¬ 
sible  for  collecting  the  Software  Quality  metrics,  objec¬ 
tively  and  subjectively  analyzing  trends,  and  proposing 
changes  to  the  development  process,  tools,  products  or 
personnel  to  improve  future  quality. 

Configured  Baseline.  A  configured  baseline  consti¬ 
tutes  a  set  of  products  which  are  subjected  to  change 
control  through  a  Quality  Control  Board  (QCB).  Con¬ 
figured  baselines  usually  represent  intermediate  prod¬ 
ucts  which  have  completed  design,  development,  and 
informal  test  and  final  products  which  have  completed 
formal  test. 
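
Since rework is measured as changed SLOC in configured baselines, one way to approximate it is to compare two versions of a configured unit line by line. The Python sketch below uses the standard library's difflib for this. It is an assumption about tooling, not a description of how the project collected its data, and it counts changed text lines rather than SLOC as defined by the counting tool. The function name is hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Approximate rework as the number of lines that differ between two
    versions of a configured unit (inserted, deleted, or replaced)."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced region counts once per line on its larger side.
            changed += max(i2 - i1, j2 - j1)
    return changed


# Invented example: one statement replaced, one statement added.
baseline = "A := 1;\nB := 2;\nC := 3;\n"
reworked = "A := 1;\nB := 20;\nC := 3;\nD := 4;\n"
print(changed_lines(baseline, reworked))
```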

Metrics  Derivation 

The  remainder  of  this  paper  provides  substantial  de¬ 
tail  in  the  definition  and  description  of  the  necessary 
statistics  to  be  collected,  the  metrics  derived  from  these 
statistics and their interpretation. This section provides
a simple overview of how these metrics were derived, the
necessity of some of the collected statistics and their raison
d'etre. The following derivations are not an obvious
top down progression; rather, they resulted from substantial
trial and error, numerous dead end analyses,
intuition and heuristics.

The fundamental hypothesis was that there was significant
information content in the character of rework
being performed over the project lifecycle. The obvious
raw statistics to collect include number and type
of software changes, SLOC damaged, and SLOC fixed.
The  problem  was  to  find  the  right  filtering  techniques 
for  the  raw  rework  statistics  which  identify  useful  trends 
and  to  uncover  objective  measures  which  quantify  prod¬ 
uct  attributes  both  during  development  and  as  an  end- 
product.  Our  original  intent  was  to  provide  a  quantifi¬ 
cation  of  the  product’s  modularity,  changeability,  and 
maintainability. The first two are intuitively simple to
define as a function of rework; the third is more subtle:

Modularity (Qmod): The average extent of breakage.
This identifies the need to quantify extent of breakage
(we will use volume of SLOC damaged) and
number of instances of rework (Number of SCOs).
In effect we are defining modularity as a measure
of breakage localization.

Changeability (Qc): The average complexity of
breakage. This identifies the need to quantify complexity
of breakage (we will use effort required to
resolve) and number of instances of rework (Number
of SCOs).

Maintainability (Qm): Theoretically the maintainability
of a product is related to the productivity
with which the maintenance team can operate.
Productivities, however, are so difficult to compare
between projects that this definition was intuitively
unsatisfying. If we ratio the productivity of rework
to the productivity of development, we end up with
a value which is independent of productivity but yet
a reflection of the complexity to change a product in
relation to the complexity to develop it. This normalizes
out the project productivity differences and
provides a relatively comparable metric. Maintainability
then, will be defined as the ratio of rework
productivity and development productivity. Intuitively,
this value identifies a product which can be
changed three times as efficiently (Qm = .33) as it
was developed as having a better (lower) maintainability
than a product that can be changed twice as
efficiently (Qm = .5) as it was developed, independent
of the absolute maintenance productivity realized.
The statistics needed to compute these values
are the total development effort, total SLOC, total
rework effort and total reworked SLOC.


While  the  values  above  provide  useful  end-product 
objective  measures,  their  intermediate  values  as  a  func¬ 
tion  of  time  would  also  provide  insight  during  the  devel¬ 
opment  process  into  the  expected  end-product  values. 
Furthermore,  once  we  have  gained  some  experience  with 
maintenance  of  early  increments,  this  experience  should 
be  useful  for  predicting  the  rework  inherent  in  remaining 
increments. 

The  above  brief  derivation  is  starting  to  push  the 
limits of our first goal (simplicity) and the following
sections, on the surface, will appear to be somewhat
complex. A few remarks about this are in order. First,
there will always be a tradeoff between simplicity and
real insight. Surface insight is usually attained very
simply; detailed insight requires added knowledge and
complexity.  We  have  chosen  a  set  of  metrics  which  range 
from  simple  to  moderately  complex  to  cover  the  multiple 
perspectives  needed  by  project  management  to  ensure 
accuracy.  It  is  not  necessary  to  deal  with  these  met¬ 
rics  as  a  complete  set.  Subsets,  or  different  sets  are  also 
useful.  Secondly,  most  of  the  analysis,  mathematics  and 
data  collection  inherent  in  these  metrics  should  be  auto¬ 
mated  so  that  managers  need  only  interpret  the  results 
and  understand  their  basis. 

The above values were determined through extensive
analysis, trial and error, and intuition. There are
certainly other metrics derivable from rework statistics
which would also provide useful insight. The following
sections provide more detailed descriptions and notations
for the collected statistics (Table 1), in-progress
indicators (Table 2), and end-product quality metrics
(Table 3). Hypothetical expectations are provided in
Figure 2 for the in-progress indicators and collected
statistics.

Collected  Statistics 

Table  1  identifies  the  necessary  statistics  which  must  be 
collected  over  the  lifecycle  to  implement  our  proposed 
metrics. 

Total  Source  Lines  The  SLOCt  metric  tracks  the 
estimated  total  size  of  the  product  under  devel¬ 
opment.  This  value  may  change  significantly  over 
the  life  of  the  development  as  early  requirements 
unknowns  are  resolved  and  as  design  solutions  ma¬ 
ture.  This  total  should  also  include  reused  software 
which  is  part  of  the  delivered  product  and  subject 
to  contractor  maintenance. 

Configured SLOC This metric simply tracks the
transition of software components from a maturing
design state into a controlled configuration.
For any given project, this metric will provide
insight into progress and stability of the
design/development team. [12] discusses some of the
tradeoffs and risk management philosophy inherent
in laying out an incremental build approach.
For projects with reused software, there will be an
early contribution to SLOCc and thus “immediate
progress” and quality metrics as defined below.

Statistic          Definition
-----------------  ----------------------------------------
Total SLOC         SLOCt  = Total Product SLOC
Configured SLOC    SLOCc  = Standalone Tested SLOC
Errors             SCO1^o = No. of Open Type 1 SCOs
                   SCO1^c = No. of Closed Type 1 SCOs
                   SCO1   = No. of Type 1 SCOs
Improvements       SCO2^o = No. of Open Type 2 SCOs
                   SCO2^c = No. of Closed Type 2 SCOs
                   SCO2   = No. of Type 2 SCOs
Open Rework        B1 = Damaged SLOC Due to SCO1^o
                   B2 = Damaged SLOC Due to SCO2^o
Closed Rework      F1 = SLOC Repaired after SCO1^c
                   F2 = SLOC Repaired after SCO2^c
Total Rework       R1 = F1 + B1
                   R2 = F2 + B2

Table 1: Collected Raw Data Definitions

Errors  Real  errors  (type  1  SCOs)  constitute  an  impor¬ 
tant  metric  from  which  many  of  the  following  are 
derived.  The  expectation  is  that  the  highest  in¬ 
cidence  of  uncovering  errors  happens  immediately 
after  the  turnover  and  decreases  with  time  (i.e.,  the 
software  matures). 

Improvements  The  other  stimulus  for  changing  a 
baseline,  improvements  (type  2  SCOs),  are  also  key 
to  the  assessment  of  quality  and  progress  towards 
producing  quality.  The  expectation  for  improve¬ 
ments  is  approximately  inversely  proportional  to 
errors,  in  that  as  the  error  rate  starts  off  high  and 
damps  out,  the  Improvements  start  off  low  (the  fo¬ 
cus  is  on  errors)  and  increase.  This  phenomenon 
is  basically  derived  from  the  assumption  that  a 
fixed team is working the Test/Maintenance program and:

$$\mathrm{Effort}_{Errors} + \mathrm{Effort}_{Improvements} = \mathrm{Constant}$$

The  actual  differentiation  between  Type  1  and 
Type  2  is  somewhat  subjective.  The  metrics  de¬ 
fined  herein  are  not  particularly  sensitive  to  either 
type  since  they  rely  on  the  sum  of  the  impacts  from 
both  types.  However,  the  difference  between  Type 
1  damage  and  Type  2  damage  may  provide  useful 
insight  as  demonstrated  on  CCPDS-R. 

Open  Rework  Theoretically,  all  rework  corresponds 
to  an  increase  in  quality.  Either  the  rework  is 
necessary  to  remove  an  instance  of  “bad”  quality 
(SCO1), or to enhance a component for life cycle
cost  effectiveness  (SCO2).  The  dynamics  of  the 
rework  coupled  with  the  project  schedule  context 
must  be  evaluated  to  provide  an  accurate  assess¬ 
ment  of  quality  trends.  A  certain  amount  of  rework 
is  a  necessity  in  a  large  software  engineering  effort. 
In  fact,  early  rework  is  considered  a  sign  of  healthy 
progress  in  the  evolutionary  process  model.  Con¬ 
tinuous  rework,  late  rework,  or  zero  rework  due  to 
the  non-existence  of  a  configured  baseline  are  gen¬ 
erally  indicators  of  negative  quality.  Interpretation 
of  this  metric  requires  project  context.  In  general 
however,  the  rework  must  ultimately  go  to  zero  at 
product  delivery.  In  order  to  provide  a  consistent 
and  automatable  collection  process,  rework  is  de¬ 
fined  as  the  number  of  SLOC  estimated  to  change 
due  to  an  SCO.  The  absolute  accuracy  of  the  esti¬ 
mates  is  generally  unimportant  and  since  open  re¬ 
work  is  tracked  with  an  estimate  and  closed  rework 
(see  below)  is  tracked  separately  with  actuals,  the 
values  continually  correct  themselves  and  remain 
consistent. 

Closed  Rework  Whereas  the  breakage  metrics  esti¬ 
mated  the  damage  done,  the  repair  metrics  should 
identify  the  actual  damage  which  was  fixed.  Upon 
resolution,  the  corresponding  breakage  estimate 
should  be  updated  to  reflect  the  actual  required  re¬ 
pair  that  remains  in  the  baseline.  The  actual  SLOC 
fixed  will  clearly  never  be  absolutely  accurate.  It 
will,  however,  be  relatively  accurate  for  assessing 
trends  inherent  in  these  metrics.  Since  fixed  can 
take  on  several  different  meanings  depending  on 
what  is  added,  deleted  and  changed,  a  consistent 
set of guidelines is necessary. Changed SLOC will
increase Ri without a change to SLOCc. Added
code will increase Ri and SLOCc, although not
necessarily in the same proportion. Deleted code
(not typically a problem) with no corresponding
addition could reduce both Ri and SLOCc. A conventional
differences tool with an appropriate preprocessor
which converts properly formatted source
files into a format which contains no comments and
1 SLOC per compared record would be the best
method for computing changed SLOC. A simpler
method (and the one used here) would be to simply
estimate the magnitude of the fixed SLOC. Given
the volume of changes and the need for only roughly
accurate data for identifying trends, the accuracy
of the raw data is relatively unimportant.
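The differences-tool approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the method used on CCPDS-R (where fixed SLOC was estimated by hand): the function names `normalize` and `changed_sloc` are hypothetical, the preprocessor assumes Ada-style `--` comments and would mishandle a `--` inside a string literal, and it treats each remaining non-blank line as one SLOC instead of reformatting the source to one statement per record.

```python
import difflib


def normalize(source: str) -> list[str]:
    """Strip comments and blank lines so each remaining record is one SLOC.

    Assumption: Ada-style "--" comments. A "--" inside a string literal
    would be wrongly truncated by this simple preprocessor.
    """
    records = []
    for line in source.splitlines():
        code = line.split("--", 1)[0].strip()
        if code:
            records.append(code)
    return records


def changed_sloc(before: str, after: str) -> dict[str, int]:
    """Count changed, added, and deleted SLOC between two versions."""
    old, new = normalize(before), normalize(after)
    counts = {"changed": 0, "added": 0, "deleted": 0}
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Pair up replaced records; any surplus is a pure add or delete.
            paired = min(i2 - i1, j2 - j1)
            counts["changed"] += paired
            counts["added"] += (j2 - j1) - paired
            counts["deleted"] += (i2 - i1) - paired
        elif tag == "insert":
            counts["added"] += j2 - j1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
    return counts
```

Under the guidelines above, the "changed" count would contribute to Ri only, while "added" and "deleted" would also adjust SLOCc.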

In-Progress  Indicators 


Table 2 defines the in-progress indicators and Figure 2
identifies relative expectations. It is difficult to define
the absolute expectations for the in-progress metrics
without comparable data from other projects. Relative
expectations are described in the following paragraphs.

Indicator          Definition
-----------------  ------------------------------
Rework Ratio       RR = (R1 + R2) / SLOCc
Rework Backlog     BB = (B1 + B2) / SLOCc
Rework Stability   SS = (R1 + R2) - (F1 + F2)

Table 2: In Progress Indicator Definitions

Rework Ratio The sum of the currently broken product
(B1 + B2) and the already repaired breakage
(F1 + F2) corresponds to the mass of the current
product baseline which has needed rework
(R1 + R2). The rework ratio (RR) identifies the
current ratio of SLOCc which is expected to undergo
rework prior to maturity into an end product.
The expectation for RR shown in Figure 2 is to increase
to a stable value with minor discontinuities
following the initial delivery of each build.

Rework Backlog The current backlog of rework (BB) is
defined as the percentage of the current SLOCc
which is currently in need of repair. In general, one
would expect that the rework backlog should rise to
some level and remain stable through the test program
until it drops off to zero. Large changes from
month to month should clearly be investigated.

Rework Stability The difference between total rework
and closed rework (SS) provides insight into the
trends of resolving issues. The important use of
this metric is to ensure that the breakage rate is
not outrunning the resolution rate. Figure 2 identifies
an idealized case where the resolution rate
does not diverge (except for short periods of time)
from the breakage rate. Note also that the breakage
rate somewhat tracks the SLOCc delivery rate.
A diverging value of SS would indicate instability
of rework activities. A stable value of SS would
indicate systematic and straightforward resolution
activities.


Figure  2:  In-Progress  Indicators  Example  Expectations 
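As a minimal sketch of how the in-progress indicators could be automated from the collected statistics, the following computes the rework ratio, rework backlog and rework stability for one reporting date. The class name `ReworkSnapshot`, its field names, and the sample numbers are hypothetical; a configured baseline (SLOCc greater than zero) is assumed to exist.

```python
from dataclasses import dataclass


@dataclass
class ReworkSnapshot:
    """Collected statistics at one reporting date (all sizes in SLOC)."""
    sloc_c: int  # configured (standalone tested) SLOC; assumed > 0
    b1: int      # open rework: damaged SLOC due to open type 1 SCOs
    b2: int      # open rework: damaged SLOC due to open type 2 SCOs
    f1: int      # closed rework: SLOC repaired for closed type 1 SCOs
    f2: int      # closed rework: SLOC repaired for closed type 2 SCOs

    @property
    def total_rework(self) -> int:
        """R1 + R2, where Ri = Fi + Bi."""
        return (self.f1 + self.b1) + (self.f2 + self.b2)

    def rework_ratio(self) -> float:
        """RR: fraction of the configured baseline that has needed rework."""
        return self.total_rework / self.sloc_c

    def rework_backlog(self) -> float:
        """Fraction of the configured baseline currently in need of repair."""
        return (self.b1 + self.b2) / self.sloc_c

    def rework_stability(self) -> int:
        """SS: total rework minus closed rework, in SLOC."""
        return self.total_rework - (self.f1 + self.f2)


if __name__ == "__main__":
    # Hypothetical numbers, not CCPDS-R data.
    snapshot = ReworkSnapshot(sloc_c=100_000, b1=1_000, b2=500, f1=9_000, f2=4_500)
    print(snapshot.rework_ratio(), snapshot.rework_backlog(), snapshot.rework_stability())
```

Since Ri = Fi + Bi, the stability value at any one date equals the open rework B1 + B2; its usefulness comes from watching whether it diverges from month to month.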


End-Product  Quality  Metrics 

The  end-product  metrics  reflect  insight  into  the 
maintainability  of  the  software  products  with  respect 


to  type  1  and  type  2  SCOs.  Type  3  SCOs  are  explic¬ 
itly  not  included  since  they  redefine  the  inherent  target 
quality  of  the  system  and  tend  to  require  more  global 
system  and  software  engineering  as  well  as  some  ma¬ 
jor  re-verification  of  system  level  requirements.  Since 
these  types  of  changes  are  dealt  with  in  extremely  di¬ 
verse  ways  by  different  customers  and  projects,  they 
would  tend  to  cloud  the  meanings  and  comparability  of 
the  data.  However,  the  metrics  data  below  should  be 
very  helpful  in  determining  and  planning  the  expected 
effort  for  implementing  type  3  SCOs. 


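Metric               Definition
-------------------  ------------------------------------------------
Rework Proportions   Re = (Effort_SCO1 + Effort_SCO2) / Effort_Total
                     Rs = (R1 + R2) / SLOCt
Modularity           Qmod = (R1 + R2) / (SCO1 + SCO2)
Changeability        Qc = (Effort_SCO1 + Effort_SCO2) / (SCO1 + SCO2)
Maintainability      Qm = Re / Rs

Table 3: End-Product Quality Metrics Definitions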

Rework Proportions The Re value identifies the
percentage  of  effort  spent  in  rework  compared  to 
the  total  effort.  In  essence,  it  probably  provides 
the  best  indicator  of  productivity.  The  activities 
included  in  these  efforts  should  only  include  the 
technical  requirements,  software  engineering,  de¬ 
sign,  development,  and  functional  test.  Higher 
level  system  engineering,  management,  configura¬ 
tion  control,  verification  testing  and  higher  level 
system  testing  should  be  excluded  since  these  ac¬ 
tivities  tend  to  be  more  a  function  of  the  company, 
customer  or  project  attributes  independent  of  qual¬ 
ity.  The  goal  here  is  to  normalize  the  widely  varying 
bureaucratic  activities  out  of  the  metrics.  Rs  pro¬ 
vides  a  value  for  comparing  with  similar  projects, 
future  increments,  or  future  projects.  Basically,  it 
defines  the  proportion  of  the  product  which  had  to 
be  reworked  in  its  lifecycle. 

Modularity  This  value  identifies  the  average  SLOC 
broken  per  SCO  which  reflects  the  inherent  abil¬ 


ity  of  the  integrated  product  to  localize  the  im¬ 
pact  of  change.  To  the  maximum  extent  possible, 
QCBs  should  ensure  that  SCOs  are  written  for  sin¬ 
gle  source  changes. 

Changeability  This  value  provides  some  insight  into 
the ease with which the products can be changed.
While  a  low  number  of  changes  is  generally  a  good 
indicator  of  a  quality  process,  the  magnitude  of  ef¬ 
fort  per  change  is  sometimes  even  more  important. 

Maintainability  This  value  identifies  the  relative  cost 
of  maintaining  the  product  with  respect  to  its  de¬ 
velopment  cost.  For  example,  if  Re  =  Rs,  one 
could  conclude  that  the  cost  of  modification  is 
equivalent  to  the  cost  of  development  from  scratch 
(not  highly  maintainable).  A  value  of  Qm  much 
less  than  1  would  tend  to  indicate  a  very  main¬ 
tainable  product,  at  least  with  respect  to  develop¬ 
ment  cost.  Since  we  would  intuitively  expect  main¬ 
tenance  costs  of  a  product  to  be  proportional  to  its 
development  cost,  this  ratio  provides  a  fair  normal¬ 
ization  for  comparison  between  different  projects. 
Since  the  numerator  of  Qm  is  in  terms  of  effort 
and  its  denominator  is  in  terms  of  SLOC,  it  is  a  ra¬ 
tio  of  productivities  (i.e.,  effort  per  SLOC).  Some 
simple  mathematical  rearrangement  will  show  that 
Qm is equivalent to:

$$Q_M = \frac{\mathrm{Productivity}_{Rework}}{\mathrm{Productivity}_{Development}}$$
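The rearrangement can be written out explicitly. The symbols below are shorthand introduced here for illustration: E_rework is the effort spent on type 1 and type 2 SCOs, S_rework is the total reworked SLOC (R1 + R2), E_total is the total development effort, and S_total is the total product SLOC. Productivity is taken as effort per SLOC, as stated above.

```latex
\begin{align*}
Q_M = \frac{R_E}{R_S}
    &= \frac{E_{\mathrm{rework}} / E_{\mathrm{total}}}
            {S_{\mathrm{rework}} / S_{\mathrm{total}}} \\
    &= \frac{E_{\mathrm{rework}} / S_{\mathrm{rework}}}
            {E_{\mathrm{total}} / S_{\mathrm{total}}}
     = \frac{\mathrm{Productivity}_{\mathrm{rework}}}
            {\mathrm{Productivity}_{\mathrm{development}}}
\end{align*}
```

Because both productivities are measured on the same project, project-specific productivity factors cancel in the ratio, which is what makes the value comparable between projects.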

It  is  difficult  to  define  the  expectations  for  the  end- 
product  metrics  without  comparable  data  from  other 
projects.  Now  that  we  have  solid  data  for  CCPDS-R,  we 
can  form  expectations  for  future  increments  of  CCPDS- 
R  as  well  as  other  projects. 

The  above  descriptions  identify  idealized  trends  for 
these  metrics.  Undoubtedly,  real  project  situations  will 
not  be  ideal.  Their  differences  from  ideal,  however,  are 
important  for  management  and  customer  to  compre¬ 
hend.  Furthermore,  the  application  of  these  metrics  on 
project  increments  as  well  as  the  project  as  a  whole, 
should  be  useful. 
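The end-product metrics described in this section can be computed directly from the lifecycle totals. The sketch below is one possible arrangement, assuming effort is recorded in technical hours; the names `EndProductStats` and `end_product_metrics` and the sample values are hypothetical and are not CCPDS-R data.

```python
from dataclasses import dataclass


@dataclass
class EndProductStats:
    """Lifecycle totals needed for the end-product quality metrics."""
    effort_sco1: float   # technical hours spent resolving type 1 SCOs
    effort_sco2: float   # technical hours spent resolving type 2 SCOs
    effort_total: float  # total technical development effort, in hours
    r1: int              # total reworked SLOC due to type 1 SCOs
    r2: int              # total reworked SLOC due to type 2 SCOs
    sco1: int            # number of type 1 SCOs
    sco2: int            # number of type 2 SCOs
    sloc_t: int          # total product SLOC


def end_product_metrics(s: EndProductStats) -> dict[str, float]:
    """Return Re, Rs, Qmod, Qc and Qm for the given totals."""
    rework_effort = s.effort_sco1 + s.effort_sco2
    rework_sloc = s.r1 + s.r2
    n_scos = s.sco1 + s.sco2
    r_e = rework_effort / s.effort_total  # share of effort spent in rework
    r_s = rework_sloc / s.sloc_t          # share of product reworked
    return {
        "Re": r_e,
        "Rs": r_s,
        "Qmod": rework_sloc / n_scos,     # average SLOC broken per SCO
        "Qc": rework_effort / n_scos,     # average hours per SCO
        "Qm": r_e / r_s,                  # rework vs. development productivity
    }


if __name__ == "__main__":
    # Hypothetical totals chosen only to make the arithmetic easy to follow.
    stats = EndProductStats(600, 400, 20_000, 3_000, 2_000, 60, 40, 50_000)
    for name, value in end_product_metrics(stats).items():
        print(f"{name} = {value:.3f}")
```

The same function could be applied per build, per CSCI, or per subsystem by passing the totals for that perspective.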

APPLICATION  RESULTS 

The Command Center Processing and Display System
Replacement (CCPDS-R) project will provide display
information used during emergency conferences by
the National Command Authorities; Chairman, Joint
Chiefs of Staff; Commander in Chief North American
Aerospace Command; Commander in Chief United
States Space Command; Commander in Chief Strategic
Air Command; and other nuclear capable Commanders
in Chief. It is the missile warning element of the new
Integrated Tactical Warning/Attack Assessment System
developed by North American Aerospace Defense
Command/Air Force Space Command.

The  CCPDS-R  project  is  being  procured  by  Air 
Force  Systems  Command  Headquarters  Electronic  Sys¬ 
tems  Division  (ESD)  at  Hanscom  AFB  and  was  awarded 
to  TRW  Defense  Systems  Group  in  June  1987.  TRW 
will  build  three  subsystems.  The  first,  identified  as  the 
Common  Subsystem,  is  30  months  into  development. 
The  Common  Subsystem  consists  of  330,000  source  lines 
of  Ada  with  a  development  schedule  of  38  months.  It 
will  be  a  highly  reliable,  real-time  distributed  system 
with  a  sophisticated  User  Interface  and  stringent  per¬ 
formance  requirements  implemented  entirely  in  Ada. 
CCPDS-R  Ada  risks  were  originally  a  very  serious  con¬ 
cern.  At  the  time  of  contract  definition,  Ada  host  and 
target  environment,  along  with  Ada  trained  personnel 
availability  were  questionable. 

The data provided in this paper was collected by
manually analyzing 1500+ CCPDS-R SCOs maintained
online and in hard copy notebooks. Most of the data defined
in the previous section was available in the SCOs.
Each problem description and resolution was evaluated
to determine whether the SCO was type 1 or type 2 and
whether the SCO was relevant to the operational product
(out of the 1500 SCOs, 910 were relevant; the remainder
were SCOs for initial turnovers, support tools,
test software or commercial software). Furthermore,
each SCO opened contained an estimate of the effort
to fix and each closed SCO provided the actual (technical)
effort required for the fix. For each relevant SCO,
the  SLOC  breakage  estimate  was  based  on  experience 
with  the  fix,  the  detailed  description  of  the  resolution, 
the  hours  of  analysis  and  the  hours  required  for  im¬ 
plementing  the  fix.  Following  the  initial  definition  of 
these  metrics  the  actual  breakage  estimates  were  col¬ 
lected  more  rigorously.  While  not  perfectly  accurate  in 
all  cases,  these  estimates  are  at  least  consistent  relative 
to  each  other  and  given  the  large  sample  space,  rela¬ 
tively  accurate  for  the  intended  use.  Again,  it  is  not 
that  important  to  be  absolutely  exact  when  the  metrics 
and  trends  are  derived  from  a  large  sample  and  only 
useful  to  at  most  1  or  2  digits  of  accuracy. 

CCPDS-R  Common  Subsystem  Analysis 

Figure  3,  Figure  4  and  Table  4  provide  the  actual 
data  to  date  for  the  CCPDS-R  project.  The  follow¬ 
ing  paragraphs  discuss  the  quality  metrics  results  for 
the  CCPDS-R  common  subsystem  as  a  whole  with  con¬ 
clusions  drawn  where  applicable.  Figure  3  provides 
CCPDS-R  actuals  with  the  incremental  build  sequence 
(SLOCc) overlaid for comparison.

Configured  SLOC.  The  CCPDS-R  installments  of 
SLOCc delivered small initial builds (A0/A1 and A2)
with  the  highest  risk  components.  The  middle  build 


(A3),  while  less  risky,  was  bulky  and  a  substantial  por¬ 
tion  of  the  build  was  produced  by  (somewhat  immature) 
automated  tools.  Nevertheless,  it  was  installed  in  two 
increments  (A31  and  A32). 

SCOs. As expected, the SCO rate is proportional
to the SLOCc rate. The actuals also suggest that the
first two builds were of higher quality at delivery
than the third build. The development
managers on the project concur with this assessment
but add that it was during the A3-A4 timeframe
that substantial requirements volatility occurred in the
user interface and external interface definitions. The
number of open SCOs has remained fairly constant with
respect to the number generated, which indicates
that the rework is being resolved in a timely fashion.

Rework Resolution. The total rework (R1 + R2) has
also grown at a rate proportional to SLOCc growth but
its rate of growth is decreasing. Now that the software
is all configured and turnovers are complete, breakage
should start damping out rapidly. The resolved rework
(F1 + F2) has tracked the total rework closely with little,
if any, divergence. The last three months indicate that
the rate of resolution is exceeding the rate of breakage.
This should indicate to the management team that no
serious problems are lurking in the future.
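The comparison of breakage rate and resolution rate lends itself to a simple automated check. The sketch below assumes monthly cumulative values of total rework (R1 + R2) and closed rework (F1 + F2) are available; the function names, the three-month window default, and the sample series are hypothetical and are not the CCPDS-R actuals.

```python
def monthly_rates(cumulative: list[int]) -> list[int]:
    """Month-over-month increments of a cumulative series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def resolution_keeping_pace(total_rework: list[int],
                            closed_rework: list[int],
                            window: int = 3) -> bool:
    """True if, in each of the last `window` months, at least as much
    SLOC was repaired as was newly broken."""
    breakage = monthly_rates(total_rework)[-window:]
    resolution = monthly_rates(closed_rework)[-window:]
    return all(fixed >= broken for broken, fixed in zip(breakage, resolution))


if __name__ == "__main__":
    # Hypothetical cumulative SLOC values for five consecutive months.
    total = [10_000, 14_000, 16_000, 17_000, 17_500]
    closed = [8_000, 11_000, 13_500, 15_000, 16_000]
    print(resolution_keeping_pace(total, closed))
```

A result of False over a sustained window would correspond to the diverging case that management should investigate.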

Rework  Ratio.  The  rework  rate  has  grown  from 
the  initial  builds  to  an  apparently  stable  value  of  .15. 
This  would  imply  that  the  initial  build  was  more  ma¬ 
ture  at  delivery  than  the  second  and  third  builds.  With 
over  98%  of  the  software  in  SLOCc,  this  value  should 
be  expected  to  be  fairly  stable  and  a  good  predictor  of 
future  rework.  The  amount  of  rework  backlog  in  pro¬ 
portion  to  SLOCc  has  remained  fairly  constant  and 
implies  that  the  divergence  of  breakage  rate  and  reso¬ 
lution  rate  should  correct  itself  shortly.  The  situation 
here  is  that  substantial  increments  are  being  added  to 
SLOCc  and  an  increase  in  breakage  vs  resolution  is  ex¬ 
pected  since  the  development  team  is  likely  focusing  on 
installing  baseline  components  rather  than  fixing  com¬ 
ponents. 

SCO  Effort  Distributions.  Figure  4  identifies  the 
distribution  of  SCOs  by  the  effort  required  for  resolu¬ 
tion.  This  graphic  also  suggests  that  the  software  is 
generally  easy  to  modify.  A  deeper  analysis  of  the  data 
shows  that  the  majority  of  complex  SCOs  occurred  in 
the  more  complex  early  builds. 

Rework  Proportions.  Re  (Table  4)  defines  the  per¬ 
cent  of  the  development  efforts  devoted  to  rework.  Since 
we  only  tracked  the  technical  effort  in  analyzing  and  im¬ 
plementing  resolutions,  we  have  compared  it  to  the  soft¬ 
ware  development  effort  devoted  to  the  same,  namely, 
the  requirements,  design,  development  and  test  effort. 
In both cases we eliminate the cost of management,
facility, secretarial, configuration management, quality
assurance, and other level of effort administrative activities.
Note that we have included the software requirements
analysis effort since, in our evolutionary approach,
there is only a subtle difference between requirements
and design. Rs defines the percentage of source
code which has undergone rework. CCPDS-R is currently
projecting a rework ratio of 14%.

Figure 3: CCPDS-R Collected Statistics

Figure 4: SCO Effort Distribution


Metric               CCPDS-R Value
-------------------  ----------------------
Rework Proportions   Re = 6.7%
                     Rs = 13.5%
Modularity           Qmod = 53 SLOC/SCO
Changeability        Qc = 15.7 Hours/SCO
Maintainability      Qm = .49

Table 4: End-Product Quality Metrics Definitions

Modularity.  This  value  characterizes  the  extent  of 
damage  expected  for  the  average  SCO.  A  value  of  53 
SLOC  implies  that  the  average  SCO  only  affected  the 
equivalent  of  one  program  unit.  Since  most  of  the  trivial 
errors  get  caught  in  standalone  test  and  demonstration 
activities,  this  value  indicates  the  average  impact  for 


the  non-trivial  errors  which  creep  into  a  configuration 
baseline.  This  value  suggests  that  the  software  design  is 
flexible  but  with  no  basis  for  comparison,  this  is  purely 
conjecture.  An  additional  metric  which  would  be  useful 
in  assessing  modularity  would  be  the  number  of  files 
affected  per  change.  This  would  provide  insight  into  the 
locality  of  change  as  well  as  the  extent.  This  information 
was  not  available  in  the  CCPDS-R  historical  data,  but 
it  is  being  collected  in  future  data. 

Changeability.  The  average  effort  per  SCO  provides 
a  mechanism  for  comparing  the  complexities  of  change. 
As  a  project  average,  16  hours  suggests  that  change  is 
fairly  simple.  When  change  is  simple,  a  project  is 
likely  to  increase  the  amount  of  change  thereby 
increasing the inherent quality.

Rework  Improvement.  Figure  5  identifies  how  the 
changeability  {Qc)  evolved  over  the  project  schedule 
to  date.  While  conventional  experience  is  that  changes 
get  more  expensive  with  time,  CCPDS-R  demonstrates 
that  the  cost  per  change  improves  with  time.  This  is 
consistent  with  the  goals  of  an  evolutionary  develop¬ 
ment  approach  [12]  and  the  promises  of  a  good  layered 
architecture  [13]  where  the  early  investment  in  the  foun¬ 
dation  components  and  high  risk  components  pays  off 
in  the  remainder  of  the  life  cycle  with  increased  ease  of 
change.  The  trend  of  this  metric  would  indicate  that 
the CCPDS-R software design has succeeded in providing
an integrable component set with effective control
of breakage. Had the trend of this metric shown
growth in effort per SCO without stabilization, management
might be concerned about the design quality and
downstream risks in reworking an increasingly hard to
change product. Note that Qc metrics do not include
the cost of downstream re-verification of higher level
requirements since the broad range of these activities
would corrupt the intent of the metric. Qc has been
purposely defined to reflect the technical risk of change,
not the cost of reverification in a larger context or the
management risk. For example, a late change of minor
complexity could result in regression test by inspection
or a complete reverification of numerous performance
threads. This range of effort varies with the context of
the change, the customer/contractor paranoia and a variety
of other issues which are not reflective of the ease
of change. The technical cost of change is not closed
out, however, until this reverification is complete since
it may result in reconsideration.
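A trend of effort per SCO over the schedule can be produced with very little tooling. The sketch below groups closed SCOs by reporting period and averages their technical effort; whether the published trend used per-period or cumulative averages is not stated here, so the per-period grouping is an assumption, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def changeability_by_period(scos):
    """Average effort per SCO (Qc) for each reporting period.

    `scos` is an iterable of (period, effort_hours) pairs, one per
    closed SCO. Periods are returned in sorted order.
    """
    hours = defaultdict(float)
    counts = defaultdict(int)
    for period, effort in scos:
        hours[period] += effort
        counts[period] += 1
    return {period: hours[period] / counts[period] for period in sorted(hours)}


if __name__ == "__main__":
    # Hypothetical SCO records: (period number, hours to resolve).
    records = [(1, 40), (1, 20), (2, 16), (2, 24), (2, 20), (3, 12)]
    print(changeability_by_period(records))
```

A series that falls or stabilizes would match the behavior described above; sustained growth would be the warning sign.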

Maintainability.  The  ratio  of  Re  to  Rs  character¬ 
izes  the  cost  of  reworking  CCPDS-R  components  com¬ 
pared  to  developing  them  from  scratch.  This  value 
along  with  the  change  traffic  experienced  during  the 
last  phase  of  the  life  cycle  could  be  used  to  predict  the 
maintenance  productivity  expected  from  the  current  de¬ 
velopment  productivity  being  experienced.  The  overall 
change  traffic  during  development  should  not  be  used  to 
predict  operational  maintenance  since  it  is  overly  biased 
by  immature  product  changes.  The  FQT  phase  change 
traffic  (likely  a  lower  value  than  the  complete  develop¬ 
ment  lifecycle  traffic),  is  a  more  accurate  measure.  A 
value  of  .49  seems  like  a  good  maintainability  rating, 
but  further  project  data  would  permit  a  better  basis  for 
assessment. 
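As a quick consistency check, the maintainability value can be recomputed from the rework proportions reported in Table 4. The table values are rounded, so the quotient is only expected to agree to about two digits.

```latex
\[
Q_M = \frac{R_E}{R_S} = \frac{0.067}{0.135} \approx 0.496
\]
```

This agrees with the reported value of .49 to within the rounding of the two percentages.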

This value requires some caveats in its usage. First,
this maintenance productivity was derived from small
scale maintenance actions (fixes and enhancements) as
opposed to large scale upgrades where system engineering
and broad redesign may be necessitated. Second, the
personnel performing the maintenance actions
were knowledgeable developers, which may bias the
maintainability compared to the expertise of the maintenance
team. This data, like any productivity data, must
be used carefully by people cognizant of its derivation
to ensure proper usage.

Functional  CSCI  Analysis.  A  complete  lower  level 


analysis  was  performed  to  analyze  the  various  contribu¬ 
tions  to  the  values  in  Table  4  by  the  individual  CSCIs. 
While the evaluation of this lower level data will not be
discussed here in detail, it did uncover some interesting
phenomena which have since been incorporated
into the plans of future subsystems. There were significant
differences in the various CSCI level values which
provided insight into various levels of quality and the
need for perturbations to future plans. The Qm varied
from  .12  to  .85  across  6  CSCIs.  For  example,  relatively 
low  values  were  observed  for  algorithm  (.12)  and  display 
(.27)  software  where  ease  of  change  was  a  clear  design 
goal.  Higher  values  were  observed  for  the  external  com¬ 
munications  software  (.51)  and  system  services  software 
(.85)  where  changes  in  an  external  message  set  for  exam¬ 
ple,  could  result  in  broader  system  impacts.  The  range 
of  values  clearly  identifies  the  relative  difference  in  risk 
associated  with  changing  various  aspects  of  the  design. 
The  absolute  risk  associated  with  these  changes  is  dif¬ 
ficult  to  assess  without  further  data  from  other  similar 
projects. 

Global  Summary.  In  general,  the  CCPDS-R  pro¬ 
gram  appears  to  be  converging  towards  a  very  high  qual¬ 
ity  product  with  high  probability.  This  assessment  is 
implied  from  the  visible  stability  in  the  quality  metrics. 
The  fact  that  these  metrics  are  stable  generally  implies 
that  the  remaining  efforts  are  predictable.  If  the  pre¬ 
dictions  do  not  extrapolate  to  better  than  required  per¬ 
formance,  action  can  be  taken.  The  key  to  optimizing 
the  value  of  these  metrics  is  to  achieve  stabilization  as 
early  as  possible  so  that  if  predicted  performance  does 
not  match  expectations,  management  can  instigate  im¬ 
provement  actions  as  early  in  the  life  cycle  as  possible. 
Some  characteristics  of  CCPDS-R  which  are  important 
to  keep  in  mind  when  interpreting  the  above  metrics 
include: 

1.  Many  changes  incurred  by  the  project  were  really 
type  3  (true  requirements  change).  However,  since 
most  of  these  were  small  it  was  easier  to  incorpo¬ 
rate  them  rather  than  go  through  the  formal  ECP 


process.  In  retrospect,  the  sum  of  all  these  little 
changes  was  quite  substantial. 

2. These metrics are derived from the development
phase; comparison with other projects’ maintenance
phase metrics is misleading. The metrics
available in the final 3 months prior to delivery (as
opposed to the lifecycle averages presented here),
however, should be fairly comparable.

Operational  Concept.  The  concept  of  operations  for 
the  software  quality  metrics  program  is  to  provide  in¬ 
sight  for  the  purposes  of  managing  product  development 
with  minimum  interference  to  the  development  team. 
This  will  be  accomplished  by  integrating  the  standards 
for  metrics  collection  into  the  tools  and  QCB  proce¬ 
dures.  The  responsibilities  of  this  initiative  are  allocated 
as  follows: 

Software Developers: Follow the core Ada
Design/Development Standards.

Software Development Managers: Follow the
evolutionary process model, adhere to core software
quality metrics policy, coordinate with project systems
effectiveness any project unique policies, interpret
systems effectiveness SQM analysis and be
accountable for issues and resolutions.

Corporate  Systems  Effectiveness: 

Define  the  SQM  policy/tools/procedures,  evalu¬ 
ate  project  implementations,  improve  the  poli¬ 
cies/tools/procedures  and  ensure  consistent  usage 
across  different  projects.  This  is  the  same  function 
proposed  by  [8]  as  the  standards  group. 

Project Software Engineering: Flow down the SQM
policy/tools/procedures into a project implementation,
implement project QCB, SQM collection, SQM analysis
and SQM reporting, evaluate project implementations,
and propose candidate improvements to the
policies/tools/procedures. Note that we are putting
this function in the hands of knowledgeable project
personnel (as opposed to conventional independent QA
personnel) since the administrators of these metrics
should be motivated for effective use through ownership
in both the process and the products.

We would foresee SQM metrics reporting on a
monthly or quarterly basis depending on project phase,
size, risks, etc. Furthermore, the entire SQM initiative
should be relatively dynamic during its infancy as real
project applications determine what is most useful and
feedback is incorporated.


SUMMARY 

By itself, CCPDS-R is perhaps a bad example for
testing these metrics. In general, the project has
performed as planned and has a high probability of
delivering a quality product. It would be useful to
examine a less successful project to illustrate the
tendencies which every project manager should be looking
for as indicators of trouble ahead.

None of these metrics, by itself, provides enough
data to make an assessment of a project’s quality.
They must be examined as a group in conjunction with
other conventional measures to arrive at an accurate
assessment. They also do not represent the only set of
useful metrics possible from the collected statistics
on SCOs and rework. There are many other ways to
examine this data and present it for trend analysis.
With further automation, these other views would be
simple to produce. The following activities still need
to be performed to provide a complete initiative:

1.  Enhance  the  standard  SCO  form  with  definitions, 
standards  and  procedures  for  usage. 

2. Enforce a single, portable SLOC counting tool.

3. Identify Ada standards (which would be mandatory
across all Ada projects) necessary to guarantee
consistent metrics collection across projects and
within projects. This primarily involves standards
for program unit headers and program layout which
are not controversial.

4. Develop an SCO database management system
with supporting tools for automated collection,
analysis and reporting in the formats defined above
and other, as yet undiscovered, useful formats.

5.  Define  QCB  procedures,  guidelines  for  metrics 
analysis  and  candidate  reporting  formats. 

6.  Incorporate  this  initiative  into  corporate  policy. 
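
As an illustration of activity 2, the following is a minimal sketch of a portable SLOC counter. The counting rule used here (every non-blank line that is not a comment-only line counts once) is an assumption made for the example, not the rule used on CCPDS-R; the function name `count_sloc` and the sample unit are likewise hypothetical. The only language fact relied on is that an Ada comment starts with two hyphens.

```python
def count_sloc(source: str) -> int:
    """Count physical source lines in Ada text.

    Assumed rule (illustrative only): a line counts if it is non-blank
    and is not a comment-only line. Ada comments begin with "--".
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines that contain only a comment.
        if stripped and not stripped.startswith("--"):
            count += 1
    return count


# Hypothetical program unit used only to exercise the counter.
EXAMPLE_UNIT = """\
-- Hypothetical program unit header
with Ada.Text_IO;

procedure Hello is
begin
   Ada.Text_IO.Put_Line ("Hello");  -- trailing comment, line still counts
end Hello;
"""

if __name__ == "__main__":
    print(count_sloc(EXAMPLE_UNIT))
```

Whatever rule a project adopts, the point of activity 2 is that one implementation of it is applied everywhere, so that SLOC figures from different builds and CSCIs remain comparable.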

In conclusion, we should evaluate the approach
presented herein against our original goals:

1. Simplicity. The number of statistics to be maintained
in an SCO database to implement this approach is 5
(type, estimate of damage in hours and SLOC, actual
hours and actual SLOC to resolve) along with the other
required parameters of an SCO. Furthermore, metrics
for SLOCc and SLOCt need to be accurately maintained.
If automated in an online DBMS, the remaining metrics
could be computed from various perspectives (e.g.,
by build, by CSCI) in a straightforward manner.
Depending on the extent of discipline already inherent
in a project’s CCB and development metrics, the above
effort could be viewed as anything from simple (as in
the case of CCPDS-R) to complex (undisciplined,
management-by-conjecture projects).

2. Ease of Use. The metrics described herein were
easy for CCPDS-R project personnel familiar with the
project context to use. Furthermore, they provide an
objective basis for discussing current trends and
future plans with outside authorities and customers.
Most trends are obvious and easily explained. Some
trends require further analysis to understand the
underlying subtleties. End-product metrics provide
simple-to-understand indicators of different software
quality aspects for the purposes of comparison and
future planning as well as assessment of process
improvement.

3. Probability of Misuse. There are enough perspectives
that provide somewhat redundant views so that misuse
should be minimized. Without further experience,
however, it is not clear that contractor and customer
will always interpret them correctly. Although correct
interpretation could never be guaranteed, it would be
beneficial to obtain more experience to evaluate where
misinterpretation is most likely.
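
The five statistics named under Simplicity can be pictured as a small record, with totals rolled up from any perspective. The sketch below is illustrative only: the class name `SCO`, its field names, the helper `rework_totals` and all sample values are assumptions made for the example and do not come from the CCPDS-R database. The identifier, build and CSCI fields stand in for "the other required parameters of an SCO".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SCO:
    """One software change order (field names are hypothetical)."""
    sco_id: str          # other required SCO parameters (assumed)
    build: str
    csci: str
    change_type: int     # statistic 1: type (3 = true requirements change)
    est_hours: float     # statistic 2: estimated damage in hours
    est_sloc: int        # statistic 3: estimated damage in SLOC
    actual_hours: float  # statistic 4: actual hours to resolve
    actual_sloc: int     # statistic 5: actual SLOC to resolve


def rework_totals(scos, perspective):
    """Total SCO count, actual hours and actual SLOC per perspective value.

    `perspective` names an SCO attribute, e.g. "build" or "csci".
    """
    totals = defaultdict(lambda: {"count": 0, "hours": 0.0, "sloc": 0})
    for sco in scos:
        bucket = totals[getattr(sco, perspective)]
        bucket["count"] += 1
        bucket["hours"] += sco.actual_hours
        bucket["sloc"] += sco.actual_sloc
    return dict(totals)


# Invented sample data, for illustration only.
SAMPLE = [
    SCO("SCO-1", "Build 1", "CSCI-A", 1, 3.0, 20, 4.0, 25),
    SCO("SCO-2", "Build 1", "CSCI-B", 3, 8.0, 60, 10.0, 80),
    SCO("SCO-3", "Build 2", "CSCI-A", 1, 2.0, 10, 2.5, 12),
]

if __name__ == "__main__":
    for perspective in ("build", "csci"):
        print(perspective, rework_totals(SAMPLE, perspective))
```

Because every rollup reads the same five stored fields, adding a new perspective only requires grouping on a different attribute, which is the sense in which the remaining metrics follow in a straightforward manner once the records are kept in an online database.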